Discarding impossible events from statistical language models

Authors

  • Armelle Brun
  • David Langlois
  • Kamel Smaïli
  • Jean Paul Haton
Abstract

This paper describes a method for detecting impossible bigrams in a space of V² bigrams, where V is the size of the vocabulary. The idea is to discard all ungrammatical events, which cannot occur in well-written text, and thereby to improve the language model. In speech recognition, we also expect to reduce the complexity of the search algorithm by making fewer comparisons. To achieve this, we extract the impossible bigrams using automatic rules based on grammatical classes. Ungrammatical biclass associations are detected, and all the corresponding bigrams are analyzed and marked as possible or impossible events. Because grammatical rules in natural language can have exceptions, we maintain an exception list for each retrieved rule.
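The filtering idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the vocabulary, the class tags, the rule set, and the exception entries are all made-up toy values, and each word is assumed to belong to exactly one grammatical class, a simplification the abstract does not make.

```python
from itertools import product

# Hypothetical toy vocabulary: word -> grammatical class.
# Assumption: one class per word, which real vocabularies do not guarantee.
WORD_CLASS = {
    "the": "DET",
    "a": "DET",
    "cat": "NOUN",
    "dog": "NOUN",
    "sleeps": "VERB",
}

# Hypothetical rules: each key is a biclass association assumed to be
# ungrammatical; each value is that rule's exception list, i.e. bigrams
# that remain possible even though their biclass is ruled out.
IMPOSSIBLE_BICLASSES = {
    ("DET", "DET"): {("the", "a")},  # made-up exception, for illustration only
    ("DET", "VERB"): set(),
}


def impossible_bigrams(word_class, rules):
    """Return every bigram of the V^2 space that the rules mark impossible."""
    impossible = set()
    # Enumerate the full V^2 bigram space.
    for w1, w2 in product(word_class, repeat=2):
        biclass = (word_class[w1], word_class[w2])
        exceptions = rules.get(biclass)
        # A bigram is impossible when its biclass is ruled out
        # and the bigram is not on that rule's exception list.
        if exceptions is not None and (w1, w2) not in exceptions:
            impossible.add((w1, w2))
    return impossible


IMPOSSIBLE = impossible_bigrams(WORD_CLASS, IMPOSSIBLE_BICLASSES)
```

A language model or a recognizer's search could then skip any bigram found in `IMPOSSIBLE`, for example by assigning it zero probability. How the remaining probability mass is redistributed is left open here.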

Similar resources

Language Model Adaptation for Automatic Speech Recognition and Statistical Machine Translation

Language modeling is critical and indispensable for many natural language applications such as automatic speech recognition and machine translation. Due to the complexity of natural language grammars, it is almost impossible to construct language models by a set of linguistic rules; therefore statistical techniques have been dominant for language modeling over the last few decades. All statisti...

Statistical Model Checking for Cyber-Physical Systems

Statistical Model Checking is useful in situations where it is either inconvenient or impossible to build a concise representation of the global transition relation. This happens frequently with cyberphysical systems: Two examples are verifying Stateflow-Simulink models and in reasoning about biochemical reactions in Systems Biology. The main problem with Statistical Model Checking is caused by...

Analyzing the function of Quranic language from the viewpoint of Alame Tabatabie

realm of Quranic language, which from among Alame Tabatabie's is the most comprehensive. He believes that the Quranic language is a mixture of various languages. The language of some of the Quran's propositions is declarative and describes objective events – both tangible and intangible; five groups of Quranic verses are as stated below: Naturalistic verses: describe natural e...

Using Sentence-Level LSTM Language Models for Script Inference

There is a small but growing body of research on statistical scripts, models of event sequences that allow probabilistic inference of implicit events from documents. These systems operate on structured verb-argument events produced by an NLP pipeline. We compare these systems with recent Recurrent Neural Net models that directly operate on raw tokens to predict sentences, finding the latter to ...

State Space Realization Theorems For Data Mining

In this paper, we consider formal series associated with events, profiles derived from events, and statistical models that make predictions about events. We prove theorems about realizations for these formal series using the language and tools of Hopf algebras.

Publication date: 2000